12 research outputs found

Content-Based Weak Supervision for Ad-Hoc Re-Ranking

Author: Dietz Laura
Hui Kai
Li Bo
Sandhaus Evan
Strohman Trevor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

One challenge with neural ranking is the need for a large amount of manually-labeled relevance judgments for training. In contrast with prior work, we examine the use of weak supervision sources for training that yield pseudo query-document pairs that already exhibit relevance (e.g., newswire headline-content pairs and encyclopedic heading-paragraph pairs). We also propose filtering techniques to eliminate training samples that are too far out of domain using two techniques: a heuristic-based approach and a novel supervised filter that re-purposes a neural ranker. Using several leading neural ranking architectures and multiple weak supervision datasets, we show that these sources of training pairs are effective on their own (outperforming prior weak supervision techniques), and that filtering can further improve performance. Comment: SIGIR 2019 (short paper)

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty

Author: Brasoveanu Adrian
Guo Stephen
Hoffart Johannes
Horne Benjamin D.
Jana Abhik
Mikolov Tomas
Sandhaus Evan
Zheng Zhicheng
Publication date: 13/12/2018

Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. There is a large number of EL tools available for different types of documents and domains, yet EL remains a challenging task where the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real applications. A priori approximations of the difficulty to link a particular entity mention can facilitate flagging of critical cases as part of semi-automated EL systems, while detecting latent factors that affect the EL performance, like corpus-specific features, can provide insights on how to improve a system based on the special characteristics of the underlying corpus. In this paper, we first introduce a consensus-based method to generate difficulty labels for entity mentions on arbitrary corpora. The difficulty labels are then exploited as training data for a supervised classification task able to predict the EL difficulty of entity mentions using a variety of features. Experiments over a corpus of news articles show that EL difficulty can be estimated with high accuracy, revealing also latent features that affect EL performance. Finally, evaluation results demonstrate the effectiveness of the proposed method to inform semi-automated EL pipelines. Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

arXiv.org e-Print Archive

Crossref

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

Author: Croft W Bruce
Dojchinovski Milan
Sandhaus Evan
Manning Christopher D.
Mihalcea Rada
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 03/05/2018

This paper presents a Kernel Entity Salience Model (KESM) that improves text understanding and retrieval by better estimating entity salience (importance) in documents. KESM represents entities by knowledge-enriched distributed representations, models the interactions between entities and words by kernels, and combines the kernel scores to estimate entity salience. The whole model is learned end-to-end using entity salience labels. The salience model also improves ad hoc search accuracy, providing effective ranking features by modeling the salience of query entities in candidate documents. Our experiments on two entity salience corpora and two TREC ad hoc search datasets demonstrate the effectiveness of KESM over frequency-based and feature-based methods. We also provide examples showing how KESM conveys its text understanding ability learned from entity salience to search.

arXiv.org e-Print Archive

Crossref

Balancing Reinforcement Learning Training Experiences in Interactive Information Retrieval

Author: Berseth Glen
Ebrahimi Javid
Li Jiwei
Sandhaus Evan
Tang Zhiwen
Yang Grace Hui
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 08/06/2021

Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent who learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, it is not the relevant documents but the irrelevant documents that predominate. This would cause very unbalanced training experiences for the agent and prevent it from learning any policy that is effective. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method is able to boost an RL agent's learning effectiveness by 22% in dealing with unseen situations. Comment: Accepted by SIGIR 202

arXiv.org e-Print Archive

Crossref

Interoperable human behavior models for simulations

Author: Bharathy Gnana
Eidelson Roy J.
Hussain Talib
Lazarus Richard
Leung Alice
McDonald David
Pelechano Gómez Núria
Sandhaus Evan
Silverman Barry G.
Publication date: 01/01/2006

Modern simulations and games have limited capabilities for simulated characters to interact with each other and with humans in rich, meaningful ways. Although significant achievements have been made in developing human behavior models (HBMs) that are able to control a single simulated entity (or a single group of simulated entities), a limiting factor is the inability of HBMs developed by different groups to interact with each other. We present an architecture and multi-level message framework for enabling HBMs to communicate with each other about their actions and their intents, and describe the results of our crowd control demonstration system which applied it to allow three distinct HBMs to interoperate within a single training-oriented simulation. Our hope is that this will encourage the development of standards for interoperability among HBMs which will lead to the development of richer training and analysis simulations. Postprint (author’s final draft)

UPCommons. Portal del coneixement obert de la UPC

Growth of Sobolev norms for the analytic NLS on T^2

Author: Guàrdia Munarriz Marcel
Procesi M.
Sandhaus Evan
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

We consider the completely resonant non-linear Schrödinger equation on the two-dimensional torus with any analytic gauge-invariant nonlinearity. Fix s>1. We show the existence of solutions of this equation which achieve arbitrarily large growth of H^s Sobolev norms. We also give estimates for the time required to attain this growth. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Growth of Sobolev norms for the analytic NLS on T^2

Author: Guàrdia Munarriz Marcel
Procesi M.
Sandhaus Evan

RECERCAT

Interoperable human behavior models for simulations

Author: Bharathy Gnana
Eidelson Roy J.
Hussain Talib
Lazarus Richard
Leung Alice
McDonald David
Pelechano Gómez Núria
Sandhaus Evan
Silverman Barry G.

RECERCAT

A multicenter study to evaluate pulmonary function in osteogenesis imperfecta

Author: Bandi Venkata
Bober Michael B
Byers Peter H
Chen Shan
Consortium Members of the Brittle Bone Disorders
Cuthbertson David
Durigova Michaela
Glorieux Francis H
Grafe Ingo
Hart Tracy
Krischer Jeffrey
Lee Brendan
Mullins Mary
Nagamani Sandesh CS
Rauch Frank
Rush Eric T
Sandhaus Robert A
Schauer Evan
Shapiro Jay R
Smith Peter A
Steiner Robert D
Sutton Vernon Reid
Tam Allison
Publication venue: 'Wiley'
Publication date: 01/12/2018

Pulmonary complications are a significant cause of morbidity and mortality in osteogenesis imperfecta (OI). However, to date, there have been few studies that have systematically evaluated pulmonary function in individuals with OI. We analyzed spirometry measurements, including forced vital capacity (FVC) and forced expiratory volume in the first second (FEV1), in a large cohort of individuals with OI (n = 217) enrolled in a multicenter, observational study. We show that individuals with the more severe form of the disease, OI type III, have significantly reduced FVC and FEV1 which do not follow the expected trends of the normal population. We also show that "normalization" of FVC and FEV1 using general population data to generate percent predicted values underestimates the pulmonary involvement in OI. Within each subtype of OI, we used linear mixed models to find potential correlations between FEV1 and FVC with the clinical variables including mobility, bisphosphonate use, and scoliosis. Our results are an important step in understanding the extent of pulmonary involvement in individuals with OI and for developing pulmonary endpoints for use in routine patient care as well as in the investigation of new therapies.

Crossref

eScholarship - University of California